Introduction to the CoNLL-2000 Shared Task: Chunking

Authors

  • Erik F. Tjong Kim Sang
  • Sabine Buchholz
Abstract

We describe the CoNLL-2000 shared task: dividing text into syntactically related nonoverlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.
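To make the idea of nonoverlapping chunks concrete, here is a minimal sketch that groups per-token tags into chunk spans. It assumes the B-/I-/O tagging scheme used in the CoNLL-2000 data (B-X opens a chunk of type X, I-X continues it, O marks a token outside any chunk). The function name `extract_chunks`, the sample sentence, and its hand-assigned tags are illustrative assumptions, not taken from any participating system.

```python
from typing import List, Optional, Tuple

def extract_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group B-/I-/O tags into (chunk_type, start, end) spans, end exclusive."""
    chunks: List[Tuple[str, int, int]] = []
    current: Optional[str] = None  # type of the chunk currently open, if any
    start = 0                      # index where the open chunk began
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, ctype = "O", None
        else:
            prefix, _, ctype = tag.partition("-")
        # Close the open chunk on O, on a new B- tag, or on a type change.
        if current is not None and (prefix in ("B", "O") or ctype != current):
            chunks.append((current, start, i))
            current = None
        # Open a chunk on B-, or leniently on an I- tag with no open chunk.
        if prefix in ("B", "I") and current is None:
            current, start = ctype, i
    if current is not None:
        chunks.append((current, start, len(tags)))
    return chunks

# Illustrative sentence with hand-assigned tags (an assumption for this sketch).
words = ["He", "reckons", "the", "current", "account", "deficit",
         "will", "narrow", "to", "only", "#", "1.8", "billion",
         "in", "September", "."]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP",
        "B-VP", "I-VP", "B-PP", "B-NP", "I-NP", "I-NP", "I-NP",
        "B-PP", "B-NP", "O"]

for ctype, begin, end in extract_chunks(tags):
    print(ctype, " ".join(words[begin:end]))
```

Because each token carries exactly one tag, the resulting spans cannot overlap, which matches the task definition in the abstract.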

Similar resources

Improving Chunking by Means of Lexical-Contextual Information in Statistical Language Models

In this work, we present a stochastic approach to shallow parsing. Most of the current approaches to shallow parsing have a common characteristic: they take the sequence of lexical tags proposed by a POS tagger as input for the chunking process. Our system produces tagging and chunking in a single process using an Integrated Language Model (ILM) formalized as Markov Models. This model integrate...

Hybrid Text Chunking

This paper describes an HMM-based chunk tagger and its extensions used in KRDL for the shared task of CoNLL'2000. Compared with a standard HMM-based tagger, this tagger incorporates more contextual information into a lexical entry. Moreover, an error-driven learning approach is adopted to decrease the memory requirement. It keeps only positive lexical entries which contribute to the error reductio...

Rule-Based Chunking and Reusability

In this paper we discuss a rule-based approach to chunking implemented using the LT-XML2 and LT-TTT2 tools. We describe the tools and the pipeline and grammars that have been developed for the task of chunking. We show that our rule-based approach is easy to adapt to different chunking styles and that the mark-up of further linguistic information such as nominal and verbal heads can be added to...

A Robust Risk Minimization based Named Entity Recognition System

This paper describes a robust linear classification system for Named Entity Recognition. A similar system has been applied to the CoNLL text chunking shared task with state-of-the-art performance. By using different linguistic features, we can easily adapt this system to other token-based linguistic tagging problems. The main focus of the current paper is to investigate the impact of various lo...

Publication date: 2000